Skip to content

refactor(dbt): state assessments rebuild#3553

Open
GabyRangelB wants to merge 32 commits intomainfrom
claude/feat/stat-rebuild
Open

refactor(dbt): state assessments rebuild#3553
GabyRangelB wants to merge 32 commits intomainfrom
claude/feat/stat-rebuild

Conversation

@GabyRangelB
Copy link
Copy Markdown
Contributor

@GabyRangelB GabyRangelB commented Mar 30, 2026

Pull Request

Summary & Motivation

When merged, this pull request will rebuild the NJ and FL state assessment reporting layer. The core motivation is to move proficiency banding, demographic labeling, and assessment metadata computation upstream into intermediate models — so that reporting models become thin joins rather than logic-heavy CTEs. It also introduces a new Google Sheets comparison demographics source, consolidates the demographic comps pipeline, and adds several correctness fixes discovered during validation.

`stg_google_sheets__state_test_comparison_demographics`

  • New staging model replacing `stg_google_sheets__state_test_comparison`
  • Adds `custom_rows` CTE to synthesize HS ALG01 totals (aggregated from HS_09 and HS_10 rows, which official comp sources do not provide as a combined HS total)
  • Computes `aligned_level_test_code` (differentiates ALG01_MS from ALG01_HS), `school_level`, `grade_range_band`, `discipline`, and `total_proficient_students` directly in staging

`int_pearson__all_assessments`

  • Pushes proficiency banding (`is_proficient`, `testperformancelevel`), aligned demographic labels (`aligned_aggregate_ethnicity`, `aligned_ml_status`, `aligned_iep_status`), and metadata (`results_type`, `district_state`, `admin`, `aligned_test_code`) upstream from the reporting CTE into the intermediate model
  • New columns: `is_504`, `is_approaching_int`, `is_below_int`

`int_pearson__student_list_report` (new)

  • New intermediate model wrapping the student list report staging
  • Adds `aligned_test_code`, `is_proficient_int`, `results_type = 'Preliminary'`, `admin`, `subject`, and `district_state`

`int_fldoe__all_assessments` (kipptaf)

  • Adds `results_type`, `district_state`, `admin`, `subject`, `illuminate_subject`, `fast_aggregated_proficiency`, and `is_proficient_int` as computed columns so the kipptaf model no longer depends on the kippmiami deployment providing them
  • Fixes `achievement_level_int` sourcing — was broken by a regex on the text field which nulled out FSA records; now correctly reads `performance_level` from the kippmiami source
  • Adds `is_approaching_int` and `is_below_int` for FL parity with NJ
  • Filters PM3 and PM1 sections to `assessment_name = 'FAST'` to prevent EOC and Science records from contaminating the FAST rollup

`int_tableau__state_assessments_demographic_comps`

  • Replaces GROUP BY CUBE (1,024 combos) with explicit GROUPING SETS (12 combos) for ~85× reduction in computed groups
  • Consolidates the demographic comps intermediate chain from 2 models + macro into 1 model
  • Fixes self-join bug that made `region_matched` / `region_outperformed` flags dead columns
  • Adds automatic prelim score gating: `valid_prelim_assessments` CTE auto-suppresses NJ student list data for any (academic_year, assessment_name) already present in `int_pearson__all_assessments` Spring — eliminating the need to manually comment/uncomment the prelim branch each cycle
  • Fixes gate join to use `admin` instead of `season` (matching the actual column name in `int_pearson__all_assessments`)

`rpt_tableau__state_assessments_dashboard`

  • Simplifies reporting layer: removes inline CASE blocks for proficiency banding, subject mapping, and demographic labeling — now reads upstream-computed columns
  • Adds same automatic prelim score gating as the demographic comps model
  • Fixes ALG01 state comps fan-out: `state_comps` CTE now groups by `school_level` instead of `aligned_level_test_code`, and the join adds `e.school_level = c.school_level` — ensures grade 8 students match MS comps and grade 9+ students match HS comps without duplicating rows

`rpt_tableau__state_assessments_dashboard_comps`

  • Updates column references to match renamed columns in the modified demographics staging model

`rpt_tableau__academic_goals_rollup`

  • Swaps `stg_fldoe__fast` ref to `int_fldoe__all_assessments` for FL assessments, consolidating the FAST data path and enabling `is_approaching_int` / `is_below_int` to flow through without inline computation

`base_powerschool__course_enrollments`

  • Adds `standardized_discipline` column

`int_extracts__student_enrollments_subjects`

  • Refactored to use upstream columns rather than recomputing subject mappings inline

New model: `int_extracts__student_enrollments_courses`

YMLs

  • New or updated YML for: `int_pearson__all_assessments`, `int_pearson__student_list_report`, `int_fldoe__all_assessments` (kipptaf), `stg_google_sheets__state_test_comparison_demographics`, `int_tableau__state_assessments_demographic_comps`
  • Added `aligned_gender` to `int_extracts__student_enrollments` YML
  • Uniqueness tests added to stg, int, and rpt models in the comps pipeline

AI Assistance

This PR was largely Claude Code-assisted under human direction. Human directed the feature requirements and data model decisions (which fields to add, how prelim gating should work, the school_level join approach for ALG01, the GROUPING SETS refactor). Claude Code handled implementation, debugging, and row-count validation against production BigQuery tables.

Self-review

General

  • Update due date and assignee on the TEAMster Asana Project
  • Review the Claude Code Review comment posted on this PR. Address valid feedback; dismiss false positives with a brief reply explaining why.

dbt

  • Include a `[model_name].yml` properties file for all new models — see dbt Conventions
  • Include (or update) an exposure for all models consumed by a dashboard, analysis, or application
  • Breaking change? `stg_google_sheets__state_test_comparison` is disabled in favor of the new `stg_google_sheets__state_test_comparison_demographics`. Confirm no remaining refs to the old model.

CI checks

  • Trunk — passes. If it fails, run `trunk check` and `trunk fmt` locally, fix any issues, and push again.
  • dbt Cloud — passes. If it fails: click Details on the check → expand Invoke `dbt build ...`Debug Logs. Or tag Claude on the PR with the error details.
  • Dagster Cloud — passes or not triggered.

Troubleshooting

GabyRangelB and others added 12 commits March 26, 2026 19:05
…c upstream

- Add proficiency banding, subject mapping, and metadata columns to
  int_pearson__all_assessments, int_fldoe__all_assessments, and
  int_iready__diagnostic_results
- Add new int_pearson__student_list_report intermediate model
- Add stg_google_sheets__state_test_comparison_demographics; disable old
  stg_google_sheets__state_test_comparison
- Add standardized_discipline to base_powerschool__course_enrollments
- Add new int_extracts__student_enrollments_courses model
- Refactor int_extracts__student_enrollments_subjects to use upstream columns
- Simplify rpt_tableau__state_assessments_dashboard and _comps by replacing
  inline CASE blocks with upstream column references
- Update int_tableau__state_assessments_demographic_comps lineage to use
  int_pearson__student_list_report

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…g CTE

- Switch state_comps CTE to stg_google_sheets__state_test_comparison_demographics
- Move results_type, admin, season, subject, test_code upstream to int models
- Rename test_code to aligned_test_code in int_pearson__student_list_report
- Add admin and subject aliases to int_fldoe__all_assessments
- Swap stg_pearson__student_list_report ref to int_pearson__student_list_report

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ch flags

Replace GROUP BY CUBE (1,024 combos) with explicit GROUPING SETS (12 combos)
for ~85x reduction in computed groups. Consolidate the demographic comps
intermediate chain from 2 models + macro into 1 model. Push demographic labels,
comparison_entity, and test_code-derived columns upstream into the intermediate
to simplify the reporting layer. Fix self-join bug that made region_matched/
region_outperformed flags dead columns. Add uniqueness tests to stg, int, and
rpt models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- dim_state_assessment_benchmarks.yml: keep expanded surrogate key description,
  use bare `- unique` test (inherits severity/store_failures from dbt_project.yml
  project-level defaults instead of repeating per-test)
- base_powerschool__course_enrollments: keep both courses_credittype normalization
  (from main) and standardized_discipline (from this branch)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add results_type, district_state, admin, subject, illuminate_subject, and
fast_aggregated_proficiency as computed columns in the kipptaf
int_fldoe__all_assessments model instead of depending on the kippmiami
upstream to provide them. This allows rpt_tableau__state_assessments_dashboard
to build without waiting for a kippmiami deployment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Refactor int_tableau__state_assessments_demographic_comps: replace
  self-contained assessment_scores CTE with three union branches
  (NJ official, NJ prelim, FL official); add test_code_metadata CTE
  from stg_google_sheets__state_test_comparison_demographics to replace
  inline school_level/grade_range_band/discipline CASE statements;
  fix aligned column references and unqualified ON clause columns

- Add YML for int_fldoe__all_assessments (kipptaf): uniqueness test,
  model description, and full column definitions including new metadata
  columns (results_type, district_state, aligned_level_test_code,
  illuminate_subject, fast_aggregated_proficiency, is_proficient_int,
  admin, subject)

- Add YML for int_pearson__all_assessments: uniqueness test, model
  description, full column definitions; fix stale columns
  (englishlearnerel, studentwithdisabilities removed; aligned_*
  demographic columns, is_proficient_int, season, admin added)

- Add YML for int_pearson__student_list_report: uniqueness test on
  (source_relation, academic_year, administration, state_id,
  aligned_test_code), model description, full column definitions;
  fix missing trailing comma in SQL

- Update stg_google_sheets__state_test_comparison_demographics YML:
  move data_tests before columns, remove redundant store_failures,
  add missing aligned_level_test_code column

- Add aligned_gender to int_extracts__student_enrollments YML

- Remove design spec (work complete)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Add prelim score gating to int_tableau__state_assessments_demographic_comps:
  new prelim_assessments and valid_prelim_assessments CTEs automatically
  exclude NJ student list data for any (academic_year, assessment_name)
  already present in int_pearson__all_assessments Spring, eliminating
  the need to manually comment/uncomment the prelim branch each cycle
- Qualify all column references in final SELECT with scores alias (s.)
  to satisfy RF02 after test_code_metadata join was introduced
- Replace rolling 7-year window with fixed 2018 floor across all three
  score branches — 2018 is the earliest year with available comps data
- Fix rpt_tableau__state_assessments_dashboard: inline-alias
  administration_window and assessment_subject from int_fldoe__all_assessments
  instead of expecting pre-computed admin/subject columns

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Strip stg_google_sheets__state_test_comparison_demographics to SELECT *
  so all columns (school_level, grade_range_band, discipline,
  aligned_level_test_code, etc.) come directly from the _v2 sheet range
- Update sources-external.yml with _v2 sheet range

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…hics rename

- Rename test_name → assessment_name, test_code → aligned_test_code in
  dim_state_assessment_benchmarks (SQL + contract YML) and state_comps
  CTE in rpt_tableau__state_assessments_dashboard
- Rename comparison_demographic_group/subgroup_aligned →
  aligned_comparison_demographic_group/subgroup across all three downstream
  models
- Fix join conditions in rpt_tableau__state_assessments_dashboard against
  state_comps CTE (all three score sections)
- Add school_level to uniqueness test on
  stg_google_sheets__state_test_comparison_demographics — differentiates
  MS_HS source rows from synthetic HS ALG01 aggregate rows

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…raphics

- Add custom_rows CTE to union source data with synthetic HS ALG01
  totals (aggregated from HS_09 and HS_10 rows, which official comp
  sources do not provide as a combined HS total)
- Rename derived columns: comparison_demographic_group/subgroup_aligned
  → aligned_comparison_demographic_group/subgroup
- Update aligned_level_test_code derivation to use aligned_test_code
  instead of test_code

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Keep SELECT * from transformations CTE (main's explicit list was
  missing race_ethnicity computed in transformations)
- Add testscorecomplete to union_relations include list, transformations
  CTE, and YML (new column introduced in main)
- Keep full column descriptions from branch; reject main's stripped YML

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@cbini cbini marked this pull request as ready for review April 9, 2026 20:03
@cbini cbini requested review from a team as code owners April 9, 2026 20:03
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 9, 2026

Claude finished @cbini's task in 6m 26s —— View job


PR Review: refactor(dbt): state assessments rebuild

  • Gather context and understand PR scope
  • Read changed files and analyze code
  • Review dbt models and SQL transformations
  • Post comprehensive review feedback

This is a well-reasoned rebuild overall. The GROUPING SETS refactor is sound (~85x reduction vs. CUBE is real), the valid_prelim_assessments self-deactivating gate is a clever design, and the self-join fix for rpt_tableau__state_assessments_dashboard_comps is real and meaningful. Issues below are organized by severity.


Critical

1. Broken regexp_extract produces wrong aligned_test_code for grades 3-9

src/dbt/kipptaf/models/pearson/intermediate/int_pearson__student_list_report.sql lines 47-50:

when test_name like '%Mathematics%'
then concat('MAT', regexp_extract(test_name, r'.{6}(.{2})'))
when test_name like '%ELA%'
then concat('ELA', regexp_extract(test_name, r'.{6}(.{2})'))

r'.{6}(.{2})' extracts characters 7-8 of the test name. For "Grade 03 Mathematics" (double-digit), chars 7-8 = "03"MAT03 ✓. For "Grade 3 Mathematics" (single-digit), chars 7-8 = " M"MAT M ✗. Grades 3-9 (the majority of state assessment grades) would produce wrong codes. The correct regex is r'Grade (\d+)'.

When aligned_test_code is wrong or NULL (from the missing else), downstream joins to test_code_metadata and valid_prelim_assessments fail silently — rows still count in aggregates but with NULL school_level/grade_range_band/discipline.


Important

2. Dead NJ prelim UNION ALL branch in rpt_tableau__state_assessments_dashboard.sql

Lines 476-591 filter a.results_type = 'Preliminary' in the JOIN. But the assessment_scores CTE (lines 163-234) has its prelim section commented out (/* disabled until next december */), so no rows carry results_type = 'Preliminary'. This branch produces zero rows every run. The current asymmetry — one block commented out, the other live — is the actual bug. Either comment out the UNION ALL branch with the same annotation, or re-enable both together.

3. int_pearson__all_assessments contract enforcement disabled with no tracking

The PR body acknowledges this, but the YML contains no TODO, issue link, or comment. The model is a direct parent of extracts/ models (which have enforced contracts). At minimum, a # TODO: re-enable contract enforcement — see #<issue> comment should be added to the properties file before merge.

4. Race/ethnicity source asymmetry across union branches

int_tableau__state_assessments_demographic_comps.sql:

  • NJ official: uses a.aligned_aggregate_ethnicity — race from Pearson assessment file
  • NJ prelim: uses inline CASE on e.race_ethnicity — race from PowerSchool enrollment
  • FL official: same inline CASE on e.race_ethnicity — race from enrollment

The inline CASE in prelim/FL also lacks an else clause, so any unrecognized race_ethnicity code returns NULL instead of a labeled bucket. Cross-year comparisons for a student can land them in different demographic buckets depending on which branch processes them.

5. valid_prelim_assessments gate relies on unvalidated string match

Lines 42-52: pa.test_type = p.assessment_name must match exactly for the gate to work (e.g., Dagster partition key 'NJSLA' must equal the hardcoded assessment_name = 'NJSLA' in official records). If the naming diverges, prelim data silently flows through even when official scores exist. Needs either an accepted_values test on test_type in the staging model or a comment documenting the coupling.

6. YML: Missing description: on model and all columns — three files

Per src/dbt/CLAUDE.md: "All new or modified models require description: on the model and every column":

  • int_tableau__state_assessments_demographic_comps.yml — no model description, no column descriptions
  • rpt_tableau__state_assessments_dashboard_comps.yml — check for missing descriptions on modified columns
  • The intermediate YML is missing descriptions entirely

7. YML: store_failures: true on individual tests — two files

Per convention: "Do not add store_failures: true to individual tests — the project default handles it."

  • int_tableau__state_assessments_demographic_comps.yml line 47
  • rpt_tableau__state_assessments_dashboard_comps.yml — same pattern

Both instances should be removed.


Minor

8. ALG01 HS aggregation sums percentages instead of weighted average (pre-existing, now load-bearing)

stg_google_sheets__state_test_comparison_demographics.sql lines 41-43:

sum(percent_proficient) as percent_proficient,
sum(total_students) as total_students,

This sums two percentage values (HS_09 + HS_10) rather than computing a weighted average. The outer SELECT then multiplies percent_proficient * total_students to get total_proficient_students — doubly wrong. Pre-existing, but now directly consumed by the refactored comps pipeline. Should be captured in a tracked issue.

9. Region self-join region_matched = true — undocumented tautology

rpt_tableau__state_assessments_dashboard_comps.sql lines 137-149: When a.comparison_entity = 'Region', the self-join b.comparison_entity = 'Region' matches the same row, so region_matched = true always. Likely intentional but needs a comment to prevent future confusion.


Summary

# File Category
1 int_pearson__student_list_report.sql — broken regex for grades 3-9 Critical
2 rpt_tableau__state_assessments_dashboard.sql — dead UNION ALL branch Important
3 int_pearson__all_assessments.yml — contract enforcement disabled, no tracking Important
4 int_tableau__state_assessments_demographic_comps.sql — race source asymmetry Important
5 int_tableau__state_assessments_demographic_comps.sql — gate relies on unvalidated string match Important
6 Three YML files — missing model/column descriptions Important
7 Two YML files — store_failures: true on individual tests Important
8 stg_google_sheets__state_test_comparison_demographics.sql — percent summing Minor
9 rpt_tableau__state_assessments_dashboard_comps.sql — undocumented self-join tautology Minor

Issues 1, 2, and 3 should be resolved before merge. Issues 4-7 should either be fixed or tracked in a follow-up issue with a reference added to the code.

GabyRangelB and others added 9 commits April 10, 2026 14:02
…nt, fix prelim gating, fix FAST source and ALG01 state comps fan-out

- int_pearson__all_assessments: add is_504, is_approaching_int, is_below_int columns
- int_fldoe__all_assessments (ktaf): fix achievement_level_int via performance_level, add is_approaching_int/is_below_int; filter PM3/PM1 to assessment_name = 'FAST' to prevent EOC/Science contamination
- rpt_tableau__academic_goals_rollup: swap stg_fldoe__fast ref to int_fldoe__all_assessments
- rpt_tableau__state_assessments_dashboard: add prelim score gating (auto-suppresses when official scores exist); fix state_comps join to use school_level instead of aligned_level_test_code to resolve ALG01 MS/HS fan-out
- int_tableau__state_assessments_demographic_comps: fix prelim gate join to use admin instead of season; fix same ALG01 fan-out via admin column
- rpt_tableau__state_assessments_dashboard_comps: update column refs to match renamed demographics staging columns

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…; remove store_failures from individual tests

- int_pearson__student_list_report: replace .{6}(.{2}) regex with
  Grade (\d+) + lpad — fixes wrong aligned_test_code for grades 3-9
  where single-digit grade names caused positional extraction to capture
  a space character instead of the grade number
- int_tableau__state_assessments_demographic_comps.yml,
  rpt_tableau__state_assessments_dashboard_comps.yml: remove
  store_failures: true from individual uniqueness tests — project
  default handles this

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@GabyRangelB
Copy link
Copy Markdown
Contributor Author

@claude - update your review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 10, 2026

Claude finished @GabyRangelB's task in 2m 54s —— View job


PR Review: refactor(dbt): state assessments rebuild

  • Gather context and understand PR scope
  • Read changed files and check prior review issues
  • Post updated review feedback

Updated review — all 9 prior issues re-checked against current branch.


Resolved since last review

1. Broken regexp_extract for grades 3-9 — Fixed ✓
int_pearson__student_list_report.sql lines 46–55 now uses r'Grade (\d+)' with lpad(..., 2, '0'). Single-digit grades (3–9) now produce correct codes (MAT03, ELA09, etc.).

2. Dead UNION ALL branch in dashboard — Fixed ✓
The commented-out prelim section in assessment_scores has been removed. Both the prelim UNION ALL in the CTE and the inner join valid_prelim_assessments gate in the NJ prelim SELECT are now live and consistent.

3. Contract enforcement disabled with no tracking — Moot ✓
int_pearson__all_assessments is an intermediate model. Per src/dbt/CLAUDE.md, only rpt_, dim_*, fct_*, and staging models require contract: enforced: true. No tracking needed.

7. store_failures: true on individual tests — Fixed ✓
Removed from both int_tableau__state_assessments_demographic_comps.yml and rpt_tableau__state_assessments_dashboard_comps.yml.


Still open

4. Race/ethnicity source asymmetry across union branches — Still present

int_tableau__state_assessments_demographic_comps.sql:

  • NJ official (lines 82–83): reads a.aligned_aggregate_ethnicity — from the Pearson assessment file
  • NJ prelim (lines 122–138) and FL official (lines 190–207): inline CASE on e.race_ethnicity — from PowerSchool enrollment

The inline CASE handles null → 'Blank' but has no else clause — any non-null, unrecognized code returns NULL, not 'Blank'. Cross-year comparisons can put the same student in different demographic buckets depending on which branch processes them.

5. valid_prelim_assessments gate relies on unvalidated string match — Still present

Both rpt_tableau__state_assessments_dashboard.sql (line 34) and int_tableau__state_assessments_demographic_comps.sql (line 50):

and pa.test_type = p.assessment_name

If the Dagster test_type partition key diverges from the assessment_name string in official records (e.g. 'NJSLA' vs 'NJSLA-S'), prelim data silently flows through even when official scores exist. Needs either an accepted_values test on test_type in the staging model or a comment documenting the coupling.

6. Missing description: on model and all columns — two YML files — Partially fixed; two files remain

Per src/dbt/CLAUDE.md: "All new or modified models require description: on the model and every column."

  • int_pearson__student_list_report.yml — now has full descriptions ✓
  • int_tableau__state_assessments_demographic_comps.yml — no model description, no column descriptions ✗
  • rpt_tableau__state_assessments_dashboard_comps.yml — no model description, no column descriptions ✗

Minor (pre-existing, tracked)

8. ALG01 HS aggregation sums percentages instead of weighted average — Still present

stg_google_sheets__state_test_comparison_demographics.sql lines 55–56:

sum(percent_proficient) as percent_proficient,
sum(total_students) as total_students,

percent_proficient is the sum of HS_09 + HS_10 percentages, not a weighted average. The outer SELECT at line 86 then multiplies percent_proficient * total_students to compute total_proficient_students, compounding the error. Pre-existing, but now load-bearing. Should be tracked in an issue if not already.

9. Region self-join region_matched = true — undocumented tautology — Still present

rpt_tableau__state_assessments_dashboard_comps.sql lines 137–149: When a.comparison_entity = 'Region', b (filtered to comparison_entity = 'Region') matches the same row, so region_matched is always true for Region rows. Likely intentional but should have a comment to prevent future confusion.


Summary

# Issue Status
1 int_pearson__student_list_report.sql — broken regex Fixed
2 rpt_tableau__state_assessments_dashboard.sql — dead UNION ALL branch Fixed
3 int_pearson__all_assessments.yml — contract enforcement concern Moot
4 int_tableau__state_assessments_demographic_comps.sql — race source asymmetry Still open
5 int_tableau__state_assessments_demographic_comps.sql — unvalidated gate string match Still open
6 Two YML files — missing model/column descriptions Partially fixed (2 of 3 remain)
7 Two YML files — store_failures: true on individual tests Fixed
8 stg_google_sheets__state_test_comparison_demographics.sql — percent summing Still present (minor/pre-existing)
9 rpt_tableau__state_assessments_dashboard_comps.sql — undocumented tautology Still present (minor)

Issues 4–6 should be resolved or tracked before merge. Issues 8–9 are minor and pre-existing.

GabyRangelB and others added 2 commits April 16, 2026 16:22
…/delete conflict

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@GabyRangelB GabyRangelB requested review from a team as code owners April 17, 2026 13:12
GabyRangelB and others added 2 commits April 17, 2026 13:16
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…from int_pearson and int_fldoe

Replace inline subject alignment, test code normalization, proficiency flag
computations, and period mapping in downstream consumers of
int_pearson__all_assessments and int_fldoe__all_assessments with the
pre-computed columns now available on both models.

Also adds model + column descriptions to int_tableau__state_assessments_demographic_comps.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@GabyRangelB
Copy link
Copy Markdown
Contributor Author

Downstream model changes — notice to model owners

The new columns added to int_pearson__all_assessments and int_fldoe__all_assessments made redundant several inline transformation expressions in downstream models. Those models were updated in this PR to remove the duplication. If you own one of the models below, please review the change summary.

Each change was validated by comparing output row counts and metric values against production BigQuery before and after — zero data differences in all cases except where noted.


@awalters — Anthony Walters

rpt_tableau__academic_goals_rollup

What changed: Three branches of state_test_union were computing a subject remap inline. All three now use the pre-computed illuminate_subject column:

Removed Replaced with
case when subject like 'English%' then 'Reading' ... (NJ) if(illuminate_subject = 'Text Study', 'Reading', 'Math')
if(assessment_subject = 'English Language Arts', 'Reading', 'Math') (FL ×2) if(illuminate_subject = 'Text Study', 'Reading', 'Math')

Validation: Math/Reading row counts unchanged. 16-row total diff predates this change (FL/iReady/Star upstream data).

Next steps: is_proficient_int, is_approaching_int, is_below_int already read directly from upstream — no further action needed.


rpt_tableau__miami_fast

What changed: is_proficient was a FLOAT64 computed inline. Now reads is_proficient_int (INT64) directly; contract YAML updated to match:

Removed Replaced with
case ft.is_proficient when true then 1.0 when false then 0.0 end as is_proficient ft.is_proficient_int as is_proficient
data_type: float64 in YML data_type: int64

Validation: Zero diff at 39,427 rows. Tableau treats INT64 and FLOAT64 identically for numeric fields.

Next steps: The remaining if(... and fl.is_proficient ...) boolean checks in the disparity/growth CTEs cannot be replaced with is_proficient_int directly, but fast_aggregated_proficiency, sublevel_number, scale_for_proficiency, and points_to_proficiency are now all pre-computed on int_fldoe__all_assessments and may simplify those CTEs further.


rpt_gsheets__kippmiami_payout_roster (currently disabled)

What changed: Two aggregation expressions recomputed is_proficient_int inline:

Removed Replaced with
round(avg(if(is_proficient, 1, 0)), 2) ×2 round(avg(is_proficient_int), 2) ×2

Validation: Pre-change query confirmed equivalence for all groups — zero mismatches.

Next steps when re-enabling: fast_aggregated_proficiency, sublevel_number, scale_for_proficiency, points_to_proficiency, scale_for_growth, and points_to_growth are now pre-computed on int_fldoe__all_assessments and may simplify the growth and disparity CTEs.


@cbini — Charlie Bini

int_topline__state_assessments_weekly

What changed: Both branches computed is_proficient_int inline. Both now read the column directly:

Removed Replaced with
case when fl.is_proficient then 1 when not fl.is_proficient then 0 end (FL) fl.is_proficient_int
case when p.is_proficient then 1 when not p.is_proficient then 0 end (NJ) p.is_proficient_int

Validation: Proficient / not-proficient / total counts by region (Camden, Miami, Newark) — zero diff.


dim_assessments

What changed: The state_nj CTE re-derived subject area and test code inline:

Removed Replaced with
if(subject = 'English Language Arts/Literacy', 'English Language Arts', subject) aligned_subject
case testcode when 'SC05' then 'SCI05' ... end aligned_test_code

Validation: Zero diff across all 24 subject/module combinations.


fct_assessment_scores_enrollment_scoped

What changed: The state_nj CTE computed subject alignment, test code normalization, and period mapping inline:

Removed Replaced with
if(subject = 'English Language Arts/Literacy', 'English Language Arts', subject) aligned_subject
case testcode when 'SC05' then 'SCI05' ... end aligned_test_code
if(period = 'FallBlock', 'Fall', period) ×2 admin

Validation: Zero diff across all 23 subject/module combinations. Surrogate key stability confirmed (administration_window values unchanged).

Next steps: The state_fl CTE could be similarly simplified — int_fldoe__all_assessments now exposes is_proficient_int, is_approaching_int, is_below_int, and illuminate_subject directly.


Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@GabyRangelB
Copy link
Copy Markdown
Contributor Author

@claude please update your review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 17, 2026

Claude finished @GabyRangelB's task in 1m 51s —— View job


PR Review: refactor(dbt): state assessments rebuild

  • Gather context and understand PR scope
  • Read changed files and check prior review issues
  • Post updated review feedback

Updated review — all 9 prior issues re-checked against current branch.


Resolved since last review

4. Race/ethnicity source asymmetry — ml_status leg — Partially fixed ✓
The NJ prelim branch now reads e.ml_status directly (line 119) rather than recomputing it inline, closing the aligned_ml_status vs inline-CASE asymmetry for that field. The aggregate_ethnicity asymmetry and the missing else described below remain.

6. Missing model description — int_tableau__state_assessments_demographic_comps.yml — Fixed ✓
Model description and all column descriptions are now present. Only one file remains.


Still open

4. Race/ethnicity source asymmetry and missing else clause — Still present

int_tableau__state_assessments_demographic_comps.sql:

  • NJ official (line 82): reads a.aligned_aggregate_ethnicity — from the Pearson assessment file
  • NJ prelim (lines 122–139) and FL official (lines 190–207): inline CASE on e.race_ethnicity — from PowerSchool enrollment

Both inline CASE blocks end with when e.race_ethnicity is null then 'Blank' but have no else clause — any non-null, unrecognized code returns NULL instead of 'Blank'. Cross-year comparisons can land the same student in different demographic buckets depending on which branch processes them.

Fix this →

5. valid_prelim_assessments gate relies on unvalidated string match — Still present

Both rpt_tableau__state_assessments_dashboard.sql (line 34) and int_tableau__state_assessments_demographic_comps.sql (line 50):

and pa.test_type = p.assessment_name

If the Dagster test_type partition key diverges from the assessment_name string in official records, prelim data silently flows through even when official scores exist. Needs either an accepted_values test on test_type in the staging model or a comment documenting the expected coupling.

6. Missing model and column descriptions — rpt_tableau__state_assessments_dashboard_comps.yml — Still open

Per src/dbt/CLAUDE.md: "All new or modified models require description: on the model and every column." The file has only data_type: entries — no description: on the model or any of its 17 columns.

Fix this →


Minor (pre-existing, tracked)

8. ALG01 HS aggregation sums percentages instead of weighted average — Still present

stg_google_sheets__state_test_comparison_demographics.sql line 57:

sum(percent_proficient) as percent_proficient,

This sums HS_09 + HS_10 percentages rather than computing a weighted average. The outer SELECT at line 89 then multiplies percent_proficient * total_students to compute total_proficient_students, compounding the error. Pre-existing and now load-bearing. Should be tracked in an issue if not already.

9. Region self-join region_matched = true — undocumented tautology — Still present

rpt_tableau__state_assessments_dashboard_comps.sql lines 132–151: When a.comparison_entity = 'Region', the join b.comparison_entity = 'Region' matches the same row, so region_matched is always true for Region rows. Likely intentional but should have a comment to prevent future confusion.


Summary

# Issue Status
1 int_pearson__student_list_report.sql — broken regex Fixed
2 rpt_tableau__state_assessments_dashboard.sql — dead UNION ALL branch Fixed
3 int_pearson__all_assessments.yml — contract enforcement concern Moot
4 int_tableau__state_assessments_demographic_comps.sql — race source asymmetry + missing else Partially fixed (ml_status ✓; aggregate_ethnicity still open)
5 Two models — unvalidated gate string match Still open
6 rpt_tableau__state_assessments_dashboard_comps.yml — missing descriptions Still open (1 of 3 fixed ✓)
7 Two YML files — store_failures: true on individual tests Fixed
8 stg_google_sheets__state_test_comparison_demographics.sql — percent summing Still present (minor/pre-existing)
9 rpt_tableau__state_assessments_dashboard_comps.sql — undocumented tautology Still present (minor)

Issues 4 (missing else), 5, and 6 should be resolved or tracked before merge.

…est_type coupling comment, and comps yml descriptions

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor Author

@claude please update your review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 17, 2026

Claude finished @GabyRangelB's task in 2m 23s —— View job


PR Review: refactor(dbt): state assessments rebuild

  • Gather context and understand PR scope
  • Read changed files and check prior review issues
  • Post updated review feedback

Updated review — all prior issues re-checked against current branch.


Resolved since last review

6. Missing model and column descriptions — rpt_tableau__state_assessments_dashboard_comps.yml — Fixed ✓

Full model description and all 16 column descriptions are now present. All three YML files from the original review now have complete descriptions.


Still open

4. Race/ethnicity source asymmetry — missing else clause — Partially fixed

The accepted_values warn test on comparison_demographic_subgroup in int_tableau__state_assessments_demographic_comps.yml (lines 69–83) is a good improvement — it will alert when an unexpected race code produces NULL rather than 'Blank'. However, the underlying CASE blocks in the SQL still lack an else clause, so any non-null, unrecognized race_ethnicity code still silently produces NULL in both the NJ prelim branch (lines 122–139) and FL official branch (lines 190–207). The accepted_values test will warn, but NULL rows will still appear in aggregates until someone investigates. Adding else 'Blank' closes this gap permanently.

Fix this →

5. valid_prelim_assessments gate coupling comment — Partially fixed

int_tableau__state_assessments_demographic_comps.sql now has the comment at line 50 (-- test_type in prelim data matches assessment_name in official records) ✓. However, rpt_tableau__state_assessments_dashboard.sql line 34 has the identical pa.test_type = p.assessment_name join condition with no comment. For consistency and to prevent the same debugging confusion in the other model, that line should also carry the coupling comment.

Fix this →


Minor (pre-existing, tracked)

8. ALG01 HS aggregation sums percentages instead of weighted average — Still present

stg_google_sheets__state_test_comparison_demographics.sql line 57:

sum(percent_proficient) as percent_proficient,

This sums HS_09 + HS_10 percentages rather than a weighted average. The outer SELECT at line 89 then multiplies percent_proficient * total_students to compute total_proficient_students, compounding the error. Pre-existing and load-bearing; should be tracked in an issue if not already.

9. Region self-join region_matched = true — undocumented tautology — Still present

rpt_tableau__state_assessments_dashboard_comps.sql lines 139–151: When a.comparison_entity = 'Region', the join on b.comparison_entity = 'Region' matches the same row, making region_matched always true for Region rows. Likely intentional but a brief comment would prevent future confusion.


Summary

# Issue Status
1 int_pearson__student_list_report.sql — broken regex Fixed
2 rpt_tableau__state_assessments_dashboard.sql — dead UNION ALL branch Fixed
3 int_pearson__all_assessments.yml — contract enforcement concern Moot
4 int_tableau__state_assessments_demographic_comps.sql — missing else 'Blank' Partially fixed (accepted_values warn test ✓; else clause still missing)
5 Gate coupling comment Partially fixed (comps model ✓; dashboard SQL still missing comment)
6 rpt_tableau__state_assessments_dashboard_comps.yml — missing descriptions Fixed
7 Two YML files — store_failures: true on individual tests Fixed
8 stg_google_sheets__state_test_comparison_demographics.sql — percent summing Still present (minor/pre-existing)
9 rpt_tableau__state_assessments_dashboard_comps.sql — undocumented tautology Still present (minor)

Issues 4 and 5 are the only open blockers. Both are one-line fixes. Issues 8–9 are minor and pre-existing.
·

…sments_dashboard

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor Author

Issue 5 fixed. On issue 4: we intentionally left out else 'Blank' — the accepted_values warn test is the desired behavior. If a new race_ethnicity code appears that isn't in the CASE, we want to know about it so we can explicitly add it to the mapping. Silently falling through to 'Blank' would hide the new code and we'd never know to handle it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant